McCulloch & Pitts Neuron and Perceptron
Table 1: Error rates of various learning algorithms on the MNIST digit recognition task.
L2 norm
transformations

Unpermuted images 
Table 1: Net_a, Net_b, and Net_c were greedily pretrained on different unlabeled subsets of
the training data, obtained by removing disjoint validation sets of 10,000 images.
After pretraining, they were trained on those same subsets using backpropagation. Training
was then continued on the full training set until the cross-entropy error reached
the criterion explained in the text.
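The splitting step in this caption can be sketched as follows, assuming the standard 60,000-image MNIST training set and assuming each 10,000-image validation set is drawn at random (the caption does not say how the images were chosen). `disjoint_validation_splits` is a hypothetical helper, not code from the paper; it only shows how three nets can each get their own held-out set with no image shared between those sets.

```python
import random


def disjoint_validation_splits(n_train=60_000, n_val=10_000, n_nets=3, seed=0):
    """Give each net its own validation set; the validation sets are pairwise disjoint.

    Returns a list of (train_indices, val_indices) pairs, one per network.
    """
    if n_nets * n_val > n_train:
        raise ValueError("not enough images for disjoint validation sets")
    order = list(range(n_train))
    random.Random(seed).shuffle(order)  # assumption: validation images chosen at random
    splits = []
    for k in range(n_nets):
        val = order[k * n_val:(k + 1) * n_val]  # consecutive, non-overlapping slices
        held_out = set(val)
        train = [i for i in order if i not in held_out]
        splits.append((train, val))
    return splits


for name, (train, val) in zip(["Net_a", "Net_b", "Net_c"], disjoint_validation_splits()):
    print(name, len(train), len(val))  # Net_a 50000 10000, and likewise for Net_b, Net_c
```

Each net is pretrained and fine-tuned on its own 50,000-image subset; because the validation slices never overlap, the three nets see different unlabeled data during pretraining.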
Table 1: Test error on networks of depth 3. Bold
results represent statistical equivalence between similar experiments,
with and without pre-training, under the null
hypothesis of the pairwise test with p = 0.05.
(a) Standard Neural Net (b) After applying dropout.


Figure 1: Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right:
An example of a thinned net produced by applying dropout to the network on the left.
Crossed units have been dropped.
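A thinned net like the one in Figure 1 can be sketched as an independent keep/drop decision for every unit. The layer sizes below are hypothetical (not read off the figure), the retention probabilities (0.8 for inputs, 0.5 for hidden units, output never dropped) are illustrative choices, and `sample_thinned_net` is a hypothetical helper.

```python
import random


def sample_thinned_net(layer_sizes, retain_probs, rng):
    """Draw one thinned network: a 0/1 keep-mask for every unit of every layer.

    layer_sizes[i] is the width of layer i; retain_probs[i] is the probability
    that a unit in layer i is kept (1.0 means the layer is never thinned).
    """
    return [[1 if rng.random() < p else 0 for _ in range(n)]
            for n, p in zip(layer_sizes, retain_probs)]


rng = random.Random(0)
sizes = [5, 5, 5, 2]          # hypothetical: input, 2 hidden layers, output
probs = [0.8, 0.5, 0.5, 1.0]  # illustrative retention probabilities
masks = sample_thinned_net(sizes, probs, rng)
for i, m in enumerate(masks):
    print(f"layer {i}: kept {sum(m)} of {len(m)} units -> {m}")

# Every droppable unit is either kept or dropped, so the thinned nets number 2**n.
n_droppable = sum(n for n, p in zip(sizes, probs) if p < 1.0)
print("possible thinned nets:", 2 ** n_droppable)  # 32768
```

A fresh mask is drawn for each training case, so training visits a different thinned net each time while all of them share the same weights.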


Dropout 



(a) At training time (b) At test time 


Figure 2: Left: A unit at training time that is present with probability p and is connected to units
in the next layer with weights w. Right: At test time, the unit is always present and
the weights are multiplied by p. The output at test time is the same as the expected output
at training time.


Dropout (2012)

• In each training step, every hidden neuron is dropped (switched off) with probability 0.5, i.e., its output is set to 0.
• Backprop is run only through the neurons that remain.
• Only the weights of the remaining neurons are updated.
• At test time all neurons are used, and their outgoing weights are multiplied by 0.5.
• This effectively trains 2^H different thinned networks that share their weights, where H is the number of hidden neurons.
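The dropout recipe (drop each hidden neuron with probability 0.5, backpropagate only through the neurons that remain, and at test time use every neuron with its outgoing weights halved) can be sketched for a one-hidden-layer network. The ReLU hidden layer, squared-error loss, single-example SGD update, and the names `dropout_sgd_step` and `predict_test_time` are assumptions made for illustration, not details from the source. Note that `p_drop` is the probability of dropping a unit, so the test-time scale is 1 - p_drop (the p of Figure 2); at 0.5 the two coincide.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dropout_sgd_step(x, t, W1, b1, w2, b2, lr, rng, p_drop=0.5):
    """One SGD step on L = (y - t)**2 / 2 for a 1-hidden-layer ReLU net with dropout.

    Each hidden unit is dropped with probability p_drop. Gradients flow only
    through kept units, so a dropped unit's weights are left unchanged.
    W1, b1 and w2 are updated in place; returns (updated b2, keep mask).
    """
    keep = [0 if rng.random() < p_drop else 1 for _ in W1]
    z = [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [k * relu(zj) for k, zj in zip(keep, z)]        # thinned hidden layer
    y = sum(wj * hj for wj, hj in zip(w2, h)) + b2
    dy = y - t                                          # dL/dy
    for j, k in enumerate(keep):
        if not k:
            continue                                    # dropped unit: no update
        dz = dy * w2[j] * (1.0 if z[j] > 0.0 else 0.0)  # backprop through ReLU (old w2[j])
        w2[j] -= lr * dy * h[j]
        b1[j] -= lr * dz
        W1[j] = [wi - lr * dz * xi for wi, xi in zip(W1[j], x)]
    return b2 - lr * dy, keep


def predict_test_time(x, W1, b1, w2, b2, p_drop=0.5):
    """Test-time net: every hidden unit present, outgoing weights scaled by 1 - p_drop."""
    h = [relu(sum(wi * xi for wi, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum((1.0 - p_drop) * wj * hj for wj, hj in zip(w2, h)) + b2


rng = random.Random(0)
W1 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]  # H = 4 hidden units
b1 = [0.0] * 4
w2 = [rng.uniform(-1.0, 1.0) for _ in range(4)]
b2 = 0.0
x, t = [0.5, -0.2, 0.9], 1.0
before = [row[:] for row in W1]
b2, keep = dropout_sgd_step(x, t, W1, b1, w2, b2, lr=0.1, rng=rng)
print("kept hidden units:", keep)
print("test-time output:", predict_test_time(x, W1, b1, w2, b2))
print("thinned nets sharing these weights:", 2 ** len(W1))
```

Rows of `W1` belonging to dropped units come out of the step unchanged, which is the "only the remaining neurons are updated" rule; the test-time function never samples a mask.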

Plot: Dropout vs. without dropout (test error against number of weight updates, 0 to 1,000,000).

Figure 4: Test error for different architectures with and without dropout. The networks have
2 to 4 hidden layers, each with 1024 to 2048 units.


6.1.1 MNIST 


Method | Unit type | Architecture | Error (%)
Standard Neural Net (Simard et al., 2003) | Logistic | 2 layers, 800 units | 1.60
SVM Gaussian kernel | NA | NA | 1.40
Dropout NN | Logistic | 3 layers, 1024 units | 1.35
Dropout NN | ReLU | 3 layers, 1024 units | 1.25
Dropout NN + max-norm constraint | ReLU | 3 layers, 1024 units | 1.06
Dropout NN + max-norm constraint | ReLU | 3 layers, 2048 units | 1.04
Dropout NN + max-norm constraint | ReLU | 2 layers, 4096 units | 1.01
Dropout NN + max-norm constraint | ReLU | 2 layers, 8192 units | 0.95
Dropout NN + max-norm constraint (Goodfellow et al., 2013) | Maxout | 2 layers, (5 × 240) units | 0.94
DBN + finetuning (Hinton and Salakhutdinov, 2006) | Logistic | 500-500-2000 | 1.18
DBM + finetuning (Salakhutdinov and Hinton, 2009) | Logistic | 500-500-2000 | 0.96
DBN + dropout finetuning | Logistic | 500-500-2000 | 0.92
DBM + dropout finetuning | Logistic | 500-500-2000 | 0.79

Table 2: Comparison of different models on MNIST.
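The max-norm constraint in the table bounds the L2 norm of each hidden unit's incoming weight vector by a constant c; in Srivastava et al. (2014) it is enforced by projecting the weights back onto the ball of radius c whenever an update pushes them outside it. A minimal sketch of that projection step, with c = 3.0 as an illustrative default (c is a hyperparameter chosen on a validation set); `apply_max_norm` is a hypothetical name.

```python
import math


def apply_max_norm(W, c=3.0):
    """Max-norm projection: rescale each hidden unit's incoming weight vector
    (one row of W) so its L2 norm is at most c; rows already inside the ball are unchanged."""
    projected = []
    for w in W:
        norm = math.sqrt(sum(v * v for v in w))
        scale = c / norm if norm > c else 1.0
        projected.append([v * scale for v in w])
    return projected


W = [[3.0, 4.0], [0.6, 0.8]]     # rows: incoming weights of two hidden units
print(apply_max_norm(W, c=2.5))  # [[1.5, 2.0], [0.6, 0.8]]: first row (norm 5) shrunk to norm 2.5
```

In training, this projection would run after every weight update, so the norm bound holds throughout optimization rather than being added as a penalty term.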
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.
(2014). Dropout: a simple way to prevent neural networks from
overfitting. Journal of Machine Learning Research, 15(1), 1929-1958.
Figure 5. Example validation images successfully classified by our
method. For each image, the ground-truth label and the top-5 labels
predicted by our method are listed.
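The top-5 listing in this caption corresponds to a simple criterion: a prediction counts as correct when the ground-truth label appears among the five highest-scoring labels. A minimal sketch; the labels and scores are invented for illustration, and `top_k_labels`/`top_k_correct` are hypothetical helper names.

```python
import heapq


def top_k_labels(scores, labels, k=5):
    """Return the k labels with the highest scores, best first."""
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [labels[i] for i in best]


def top_k_correct(scores, labels, truth, k=5):
    """Top-k criterion: correct if the ground-truth label is among the k best labels."""
    return truth in top_k_labels(scores, labels, k)


# Invented classifier scores for one image (illustration only).
labels = ["cat", "dog", "fox", "owl", "bee", "ant"]
scores = [0.05, 0.40, 0.10, 0.25, 0.15, 0.02]
print(top_k_labels(scores, labels))          # ['dog', 'owl', 'bee', 'fox', 'cat']
print(top_k_correct(scores, labels, "ant"))  # False: 'ant' is ranked sixth
```

`heapq.nlargest` avoids sorting the full score vector, which matters when there are many classes but only the top few are needed.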